Quan Kong

Woven by Toyota, Inc., Tokyo, Japan

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering

May 22, 2026
Via arXiv

Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding

May 19, 2026
Via arXiv

InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding

Apr 09, 2026
Via arXiv

ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding

Mar 23, 2026
Via arXiv

Vision-TTT: Efficient and Expressive Visual Representation Learning with Test-Time Training

Feb 28, 2026
Via arXiv

TrajTok: Learning Trajectory Tokens enables better Video Understanding

Feb 26, 2026
Via arXiv

Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection

Aug 08, 2025
Via arXiv

Synthetic Visual Genome

Jun 09, 2025
Via arXiv

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

May 29, 2025
Via arXiv

Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras

May 23, 2025
Via arXiv